The existing heart failure risk prediction models are developed based on 
machine learning predictors. The objective of this study is to identify the key 
risk factors that affect the survival time of heart patients and to develop a heart 
failure survival prediction model using the identified risk factors. A Cox 
proportional hazards regression method is applied to generate the proposed 
heart failure survival model. The model is developed using multiple risk factors: 
age, anemia, creatinine phosphokinase, diabetes history, ejection fraction, 
presence of high blood pressure, platelet count, serum creatinine, serum sodium, 
sex, and smoking history. Among the risk factors, high blood pressure is identified as 
one of the novel risk factors for heart failure. We have validated the 
performance of the model via statistical and empirical validation. The 
experimental results show that the proposed model achieved good 
discrimination and calibration ability, with a concordance index (C-index) of 
0.74 and a partial log-likelihood ratio test statistic of 81.95 on 11 degrees of 
freedom on the validation dataset. 
1. INTRODUCTION 

Heart failure is a clinical syndrome characterized by a reduced ability of the heart to pump blood to 
other parts of the body or to fill with blood [1], [2]. Heart failure leads to fatigue, shortness of breath, and poor 
quality of life. Patients with heart failure have a higher mortality rate, and various biostatistical methods, as 
well as machine learning methods, have been applied to predict heart failure deaths from patients' medical 
records. Savasci et al. [3], Toulni et al. [4], and Nugroho et al. [5] have studied the problem of heart 
failure diagnosis using different machine learning predictive models. The literature shows that the performance 
of such predictive models is acceptable, although improving the predictive performance of existing work 
remains an open research problem. For example, Su et al. [6] improved the performance of an existing 
heart failure prediction model using hyper-parameter tuning. 

In hyper-parameter tuning, the researchers employed a grid search method to determine better 
parameter settings for effective heart failure prediction using a machine learning model. The experimental 
results show that the accuracy of the k-nearest neighbor (k-NN) classifier improves by 8.25% with hyper-parameter 
tuning when the optimal value of k is used for training the model. Overall, an accuracy of 86.46% is achieved 
using the developed model. Studies by Sumiati et al. [7], Animesh et al. [8], and Ip et al. [9] also show 
that different machine learning algorithms have been applied to heart failure prediction. In recent years, machine 
learning has been widely adopted in survival analysis [10]-[12], where it is applied to model the time to 
death due to heart failure. 
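Grid search simply scores every candidate value of a hyper-parameter and keeps the best one. The sketch below illustrates the idea for choosing k in k-NN. It assumes leave-one-out accuracy as the selection criterion and uses a small hypothetical two-feature dataset; the helper names (`knn_predict`, `loo_accuracy`, `grid_search_k`) are illustrative and are not taken from Su et al. [6], whose validation scheme and data may differ.

```python
import math
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda point: math.dist(point[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each point from all the other points."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += knn_predict(train, x, k) == y
    return correct / len(data)

def grid_search_k(data, candidate_ks):
    """Evaluate every candidate k and return (best_k, best_accuracy)."""
    scores = {k: loo_accuracy(data, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)  # first k reaching the top score
    return best_k, scores[best_k]

# Hypothetical toy data: two well-separated clusters labelled 0 and 1.
data = [((1.0, 1.0), 0), ((1.2, 0.8), 0), ((0.9, 1.1), 0), ((1.1, 1.3), 0),
        ((3.0, 3.0), 1), ((3.2, 2.9), 1), ((2.8, 3.1), 1), ((3.1, 3.3), 1)]
best_k, best_acc = grid_search_k(data, [1, 3, 5, 7])
```

Odd values of k avoid voting ties in this two-class example. With k = 7, every held-out point is outvoted by the opposite cluster, which is the kind of poor setting a grid search is meant to rule out.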

The objective of this research is to propose a method for heart failure prediction using the Cox 
proportional hazards model and to investigate the determinant risk factors of heart failure. This research explores 
the answers to the following research questions: i) what is the most significant risk factor for heart failure?; 
ii) how can a machine learning model be optimized for heart failure survival prediction?; iii) which covariate in 
the heart failure dataset has a significant impact on the survival rate?; and iv) what is the performance of the 
Cox proportional hazards model for heart failure survival analysis? The rest of the paper is organized as follows: 
Section 2 presents the method, Section 3 discusses the results and findings of the study, and the final section 
concludes the work. 


2. METHOD 

The dataset employed in this study is the University of California Irvine (UCI) heart failure clinical 
records dataset, which consists of 299 samples: 96 samples with a death event and 203 samples without. Cox 
proportional hazards regression analysis [13]-[15], one of the most widely used statistical methods for survival 
analysis, is employed to develop the proposed heart failure survival analysis model. This research aims to 
develop a heart failure prediction model that uses multiple parameters to estimate the probability that an 
individual survives heart failure. 
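For reference, the standard textbook form of the model (stated as background, not as this paper's own derivation) writes the hazard of an individual with covariate vector $\mathbf{x}$ (here, the 11 risk factors) as a baseline hazard $h_0(t)$ scaled by $\exp(\boldsymbol{\beta}^{\top}\mathbf{x})$, so $\exp(\beta_j)$ is the hazard ratio for a one-unit increase in covariate $j$. The coefficients are estimated by maximizing the partial likelihood (shown here ignoring tied event times), where $\delta_i = 1$ if individual $i$ died and $0$ if censored, and $R(t_i)$ is the set of individuals still under observation at time $t_i$. The survival function of Section 2.1 then follows from a baseline survival function $S_0(t)$.

```latex
\begin{aligned}
h(t \mid \mathbf{x}) &= h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right) \\
L(\boldsymbol{\beta}) &= \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right)}
       {\sum_{j \in R(t_i)} \exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_j\right)} \\
S(t \mid \mathbf{x}) &= S_0(t)^{\exp\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right)}
\end{aligned}
```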


2.1. Survival function 

Let T denote the future lifetime of an individual aged 0. T is a continuous random variable that takes 
values on $\mathbb{R}_{+} = [0, \infty)$. For human life calculations, the limiting age is typically 120 years, so T falls in the range 
[0, 120]. The cumulative distribution function of T [16]-[19] gives the probability of death by age t 
and is defined in (1). 


$$F(t) = P(T \le t) \tag{1}$$
The survival function of T is the probability of surviving beyond age t and is defined in (2). 
$$S(t) = P(T > t) = 1 - F(t) \tag{2}$$
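As a minimal numerical illustration of (1) and (2), assuming fully observed lifetimes with no censoring, the empirical versions of F(t) and S(t) are simple proportions. The ages below are hypothetical, and the helper names are illustrative. With censored follow-up, as in the heart failure data, these proportions are biased, and estimators such as Kaplan-Meier are used instead.

```python
def empirical_cdf(lifetimes, t):
    """Estimate F(t) = P(T <= t) as the fraction of lifetimes at or below t."""
    return sum(1 for T in lifetimes if T <= t) / len(lifetimes)

def empirical_survival(lifetimes, t):
    """Estimate S(t) = P(T > t) as the fraction of lifetimes above t."""
    return sum(1 for T in lifetimes if T > t) / len(lifetimes)

# Hypothetical, fully observed ages at death (no censoring).
ages_at_death = [45, 60, 62, 70, 81, 90]
```

Because every lifetime is counted in exactly one of the two proportions, the estimates satisfy S(t) = 1 - F(t) for every t, mirroring (2).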


2.2. Exploratory data analysis (EDA) 

Exploratory data analysis has recently become a common method for extracting insight from heart 
disease datasets [20]-[25]. To explore the dataset and understand the effect of each feature on heart failure 
survival, we used distribution plots from the seaborn package. The effect of each risk factor in the heart 
failure dataset is explored to understand the relationship between each feature and the target variable, i.e., 
the death event due to heart failure. The distributions of the numerical covariates and their relationships to 
death events are shown in Figures 1-5. Figure 1 shows the distribution of age and the relationship between 
age and death events due to heart failure; most of the death events occurred at around the age of 60. 

Figure 2 shows the effect of platelets on heart failure. Figure 3 shows the distribution of creatinine 
phosphokinase and its relationship with the survival of heart disease patients; as Figure 3 shows, lower 
creatinine phosphokinase is associated with more deaths. 

Figure 4 shows the relationship between the distribution of serum creatinine and the survival of heart 
disease patients; as Figure 4 shows, lower serum creatinine is associated with more deaths. Figure 5 shows the 
relationship between the distribution of serum sodium and the survival of heart disease patients; as Figure 5 
shows, higher serum sodium is associated with more deaths. 
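The distribution plots in Figures 1-5 compare how a numerical feature is spread among patients who died and those who survived. A minimal sketch of the counts behind such a plot, without a plotting library, is given below. The bin width of 10 and the ages are hypothetical, and `binned_counts` is an illustrative helper rather than a seaborn function.

```python
from collections import Counter

def binned_counts(values, events, width=10):
    """Count death events and survivors in fixed-width bins of a numeric feature.

    This is the tabular counterpart of a histogram split by outcome.
    """
    deaths, survivors = Counter(), Counter()
    for value, event in zip(values, events):
        left_edge = int(value // width) * width  # e.g. 58 falls in the bin starting at 50
        if event:
            deaths[left_edge] += 1
        else:
            survivors[left_edge] += 1
    return deaths, survivors

# Hypothetical ages and outcomes (1 = death event), not taken from the UCI data.
ages = [42, 55, 58, 60, 61, 65, 70, 72]
died = [0, 0, 1, 1, 1, 0, 1, 0]
deaths, survivors = binned_counts(ages, died)
```

Reading off the bin with the most deaths (`deaths.most_common(1)`) is the numerical analogue of spotting the peak of the death-event distribution in a plot.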

In addition to the graphical exploration of the numerical features of the heart failure dataset (Figures 1-5), 
we computed statistical summaries such as the count, mean, median, standard deviation, maximum, and minimum. 
The dataset contains the following potential risk factors: age, serum sodium, serum creatinine, gender, smoking, 
diabetes, blood pressure (BP), ejection fraction (EF), anemia, platelets, and creatinine phosphokinase (CPK). The 
statistical summary of the dataset is presented in Table 1. 
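The per-feature summary can be sketched with Python's standard `statistics` module. The follow-up times below are hypothetical, and `summarize` is an illustrative helper; pandas' `DataFrame.describe()` reports similar fields (plus quartiles) for every numeric column at once.

```python
import statistics

def summarize(values):
    """Count, mean, median, sample standard deviation, min and max of one feature."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation, divides by n - 1
        "min": min(values),
        "max": max(values),
    }

# Hypothetical follow-up times in days, for illustration only.
follow_up_days = [4, 8, 15, 16, 23, 42]
summary = summarize(follow_up_days)
```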
The Cox proportional hazards model is fitted to the data with censoring taken into account: the model is 
specified by the event to predict (i.e., death due to heart failure) and the observed time until the event 
occurred or until follow-up ended without it. In fitting the model, 203 individuals had not died at the time of 
their follow-up examination at the hospital, so their observations are censored. From the summary statistics in 
Table 1, we observe that the mean follow-up period was 130 days, ranging from 4 to 285 days. 
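To make "taking censoring into account" concrete, the sketch below fits a one-covariate Cox model by Newton-Raphson on hypothetical data. Censored individuals contribute no event term of their own but remain in the risk set of every death that occurs at or before their censoring time. It is a teaching sketch under these assumptions, not the lifelines implementation used in this study, and all helper names are hypothetical. The last line computes twice the gain in partial log-likelihood over beta = 0, the same kind of likelihood ratio statistic that Table 1 reports for the 11-covariate model.

```python
import math

def risk_set(times, t):
    """Indices of subjects still under observation at time t (observed time >= t)."""
    return [j for j, tj in enumerate(times) if tj >= t]

def cox_partial_loglik(beta, times, events, x):
    """Partial log-likelihood of a one-covariate Cox model.

    Censored subjects (event 0) add no term of their own but stay in the risk
    set of every death up to their censoring time. Tied death times share one
    risk set (Breslow's approximation).
    """
    ll = 0.0
    for i, (t, died) in enumerate(zip(times, events)):
        if died:
            denom = sum(math.exp(beta * x[j]) for j in risk_set(times, t))
            ll += beta * x[i] - math.log(denom)
    return ll

def score_and_information(beta, times, events, x):
    """Score U(beta) and observed information I(beta) of the partial log-likelihood."""
    u = info = 0.0
    for i, (t, died) in enumerate(zip(times, events)):
        if not died:
            continue
        risk = risk_set(times, t)
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        u += x[i] - s1 / s0               # observed minus risk-set weighted mean
        info += s2 / s0 - (s1 / s0) ** 2  # risk-set weighted variance
    return u, info

def fit_cox_1d(times, events, x, tol=1e-10, max_iter=100):
    """Maximize the partial log-likelihood by Newton-Raphson with step halving."""
    beta = 0.0
    for _ in range(max_iter):
        u, info = score_and_information(beta, times, events, x)
        step = u / info
        current = cox_partial_loglik(beta, times, events, x)
        # Halve the step while it would lower the log-likelihood.
        while abs(step) > tol and cox_partial_loglik(beta + step, times, events, x) < current:
            step /= 2
        beta += step
        if abs(step) <= tol:
            break
    return beta

# Hypothetical follow-up data: time in days, 1 = death observed, 0 = censored.
times = [5, 8, 12, 3, 9, 15]
events = [1, 0, 1, 1, 1, 0]
x = [1.0, 0.0, 0.5, 2.0, 0.0, 1.5]  # one hypothetical covariate

beta_hat = fit_cox_1d(times, events, x)
lr_stat = 2 * (cox_partial_loglik(beta_hat, times, events, x)
               - cox_partial_loglik(0.0, times, events, x))
```

At beta = 0 every risk set member has equal weight, so the partial log-likelihood reduces to minus the sum of the logs of the risk set sizes at each death, which is a handy check on the risk set logic.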
Figure 5. Serum sodium distribution and its relation with death event due to heart failure 


Table 1. The performance of the proposed model on heart failure survival analysis 


Model                                Cox proportional hazards fitter
Duration                             Time in days (4 to 285 days)
Event                                Death event due to heart failure
Baseline estimation                  Breslow
Number of observations               299
Number of events observed            96
Partial log-likelihood ratio test    81.95 on 11 degrees of freedom
Concordance (C-index)                0.741


3. RESULTS AND DISCUSSION 

The Cox model is implemented using the lifelines Python package. After creating the heart failure 
survival analysis model, we evaluated its performance using the concordance index (C-index). The C-index for 
the proposed heart failure survival model is the weighted average of the area under time-specific receiver 
operating characteristic (ROC) curves and is reported in Table 1. As shown in Table 1, the C-index of the 
proposed model is 0.74, which is a promising result and indicates that the model is reasonably effective in 
predicting the survival time of a patient suffering from heart failure. Further statistical results of the 
proposed model, such as z-values and p-values, are presented in Table 2. 
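One common way to compute a concordance index is Harrell's pairwise definition, sketched below under simplifying assumptions: pairs with tied times are skipped, and tied risk scores count as one half. It is illustrative only; lifelines provides its own `concordance_index` function in `lifelines.utils`, and the weighted time-specific ROC formulation described above may give a slightly different value on the same data. A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs whose risk ordering matches survival.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. It is concordant when i also has the higher risk
    score; tied scores count as one half. Pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical follow-up times (days) and event flags (1 = death, 0 = censored).
times = [2, 4, 6, 8]
events = [1, 1, 1, 0]
```

For example, risk scores `[4, 3, 1, 2]` rank five of the six comparable pairs correctly, giving a C-index of 5/6.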


Table 2. Cox regression coefficients, hazard ratios, and Wald test results for each covariate 


Feature                   Coef    exp(Coef)  exp(Coef) lower 95%  exp(Coef) upper 95%  Z-value  P-value
Age                        0.046  1.048      1.029                1.067                 4.977   <0.0005
Anaemia                    0.460  1.584      1.036                2.423                 2.122   0.034
Creatinine phosphokinase   0.000  1.000      1.000                1.000                 2.226   0.026
Diabetes                   0.140  1.150      0.743                1.781                 0.627   0.531
Ejection fraction         -0.049  0.952      0.933                0.972                -4.672   <0.0005
High blood pressure        0.476  1.609      1.053                2.458                 2.201   0.028
Platelets                 -0.000  1.000      1.000                1.000                -0.412   0.681
Serum creatinine           0.321  1.379      1.201                1.582                 4.575   <0.0005
Serum sodium              -0.044  0.957      0.914                1.001                -1.899   0.058
Sex                       -0.238  0.789      0.482                1.291                -0.944   0.345
Smoking                    0.129  1.138      0.695                1.861                 0.513   0.608


As shown in Table 2, age is the most important variable, as evidenced by the highest z-value (4.977). Each 
one-year increase in age is associated with a 1.048-fold increase in the hazard of a heart failure death event 
(95% CI 1.029 to 1.067). Serum creatinine is also highly significant (z = 4.575), ranking just behind ejection 
fraction (z = -4.672) by absolute z-value: each unit increase in serum creatinine increases the hazard of death 
from heart failure 1.379-fold (95% CI 1.201 to 1.582). Figure 6 shows the log hazard ratios of the risk factors 
for heart failure death events. High blood pressure, anemia, serum creatinine, age, diabetes, and smoking have 
positive log hazard ratios, i.e., they are associated with an increased risk of death, whereas ejection fraction, 
serum sodium, and sex have negative log hazard ratios, and platelets and creatinine phosphokinase have log hazard 
ratios close to zero. However, only the covariates with p < 0.05 in Table 2 (age, anaemia, creatinine 
phosphokinase, ejection fraction, high blood pressure, and serum creatinine) are statistically significant at the 
5% level; the 95% confidence intervals of diabetes, smoking, sex, serum sodium, and platelets include a hazard 
ratio of 1. 
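In a Cox model, the hazard ratio for a delta-unit increase in a covariate is exp(coef * delta), and a 95% Wald interval is exp(coef +/- 1.96 * SE). The sketch below applies this to the rounded age coefficient from Table 2, assuming age is recorded in years; the helper names are hypothetical. Because the table's coefficients are rounded, exp(0.046), about 1.047, differs slightly from the reported exp(Coef) of 1.048, and the same coefficient implies roughly a 1.58-fold hazard for a ten-year age difference.

```python
import math

def hazard_ratio(coef, delta=1.0):
    """Hazard ratio for a `delta`-unit increase in a covariate: exp(coef * delta)."""
    return math.exp(coef * delta)

def wald_ci(coef, se, z=1.959964):
    """95% Wald interval for the hazard ratio: exp(coef - z*se), exp(coef + z*se)."""
    return math.exp(coef - z * se), math.exp(coef + z * se)

def se_from_ci(lower_hr, upper_hr, z=1.959964):
    """Recover the coefficient's standard error from a reported 95% HR interval."""
    return (math.log(upper_hr) - math.log(lower_hr)) / (2 * z)

age_coef = 0.046  # rounded age coefficient from Table 2 (assumed per year of age)
per_year = hazard_ratio(age_coef)          # about 1.047
per_decade = hazard_ratio(age_coef, 10.0)  # about 1.584
```

Applied to the age interval in Table 2 (1.029 to 1.067), `se_from_ci` gives a standard error of roughly 0.009, and 0.046 divided by that value is close to the reported z-value of 4.977, a quick consistency check on the table.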
Figure 6. Log hazard ratios (HR) of risk factors for death events due to heart failure 


4. CONCLUSION 
In this study, a heart failure risk prediction model is proposed using the Cox proportional hazards model. The 
proposed model is evaluated using the concordance index (C-index), and the experimental results suggest that 
the model estimates survival functions with reasonable discrimination for individuals with heart failure. 
Overall, the proposed model allows us to estimate how likely a person is to survive or die over time due to 
heart failure. The proposed model achieved a C-index of 0.74 on the experimental test, which is a promising 
result. 